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Abstract 

Background: Oligonucleotide microarray-based comparative genomic hybridization (CGH) offers an attractive 
possible route for the rapid and cost-effective genome-wide discovery of deletion mutations. CGH typically involves 
comparison of the hybridization intensities of genomic DNA samples with microarray chip representations of entire 
genomes, and has widespread potential application in experimental research and medical diagnostics. However, 
the power to detect small deletions is low. 

Results: Here we use a graduated series of Arabidopsis thaliana genomic deletion mutations (of sizes ranging from 
4 bp to ~5 kb) to optimize CGH-based genomic deletion detection. We show that the power to detect smaller 
deletions (4, 28 and 104 bp) depends upon oligonucleotide density (essentially the number of genome- 
representative oligonucleotides on the microarray chip), and determine the oligonucleotide spacings necessary to 
guarantee detection of deletions of specified size. 

Conclusions: Our findings will enhance a wide range of research and clinical applications, and in particular will aid 
in the discovery of genomic deletions in the absence of a priori knowledge of their existence. 
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Background 

Oligonucleotide microarrays were first developed -20 years 
ago [1]. Present day microarrays are vastly superior to their 
predecessors in terms of quality, probe density, and they 
can represent an entire species genome [2-4]. Microarray 
technology, together with advances in genomics and bio- 
informatics methodologies, revolutionized the ways that 
we interrogate and study genomes. These approaches have 
great power because they allow simultaneous survey and 
profiling of thousands of genes, and enable whole genomes 
to be assayed at particular moments in time and under 
specific conditions. 

The scientific applications of microarray technology 
range from gene expression profiling, comparative gen- 
omic hybridization (CGH), and chromatin immunoprecip- 
itation analysis, to single nucleotide polymorphism (SNP) 
detection. Recently, next generation sequencing (NGS) 
has also been used as a major discovery tool in these 
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applications, particularly where well-annotated whole gen- 
ome datasets are available [5]. However, for species with 
large, complex (e.g., transposon-rich [6]), polyploid ge- 
nomes, or for species with genomes that are not well an- 
notated, NGS is relatively poorly suited to the detection of 
SNPs, insertions/deletions (INDELs) and other variants 
because of the short sequencing reads and depth of se- 
quencing coverage needed [7-9]. Moreover, the computa- 
tional overheads associated with the analysis of NGS data 
are a significant barrier to their use in some laboratories. 
By contrast, microarrays are well-established research 
tools that require only well-established analysis methods. 
These can be performed easily on personal computers, 
and allow rapid and routine analysis of multiple samples. 
Hence, whilst the use of microarrays and NGS in genomic 
analysis will likely continue to be complementary [10], we 
here describe work specifically aimed at optimizing gen- 
omic deletion mutation detection via microarray-based 
approaches. 

The first discovery of a plant genomic deletion mutation 
using microarray-based CGH utilized the Arabidopsis 
thaliana ATHl genome array, a single-chip array featuring 
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22,500 probes representing approximately 24,000 A. thali- 
ana gene sequences. CGH using this array led to the 
discovery of a phenotype-causal 523 bp fast-neutron (FN) 
irradiation-induced deletion mutation [11]. This deletion 
was within the second exon and intron of the A, thaliana 
AtHKTl gene (A^4G 10310) in a previously identified ion 
accumulation mutant [11]. Up to that point the identifica- 
tion of mutations that caused phenotypic changes in plants 
required laborious map-based cloning [12]. 

Mutagens such as FNs frequently induce small gen- 
omic DNA deletions of 1-6 bp [13] and permit the iden- 
tification of phenotype-causal mutations in an unbiased 
manner versus mutagens such as EMS that almost ex- 
clusively cause G:C>A:T point mutations [12,14-17]. 
The use of tiling microarrays to discover these causal 
mutations potentially reduces the time, cost and labour 
needed, important when large numbers of samples are 
analysed [4,11]. However, mutation discovery remains a 
difficult process, especially with respect to relatively 
small-sized deletions. 

In this study we describe the use of a graduated series 
of previously characterized A, thaliana genomic deletion 
mutations to discover the parameters required for rapid 
and robust detection of genomic deletion mutations, and 
in particular of relatively small sized deletion mutations. 
This series presents a range of deletion mutations, from 
4 bp to ~ 5 kb, thus extending over a four orders of mag- 
nitude difference in deletion size. We use standard and 
customized versions of the Roche NimbleGen 2.1 million 
feature high-resolution microarrays [18] to detect these 
deletions via CGH, and are hence able to determine the 
oligonucleotide densities necessary to robustly detect gen- 
omic deletions as small as 4 bp in size. We show that this 
technology has the potential to uncover deletions that 
were not detectable with previous array based designs 
and equalling NGS-based technologies. Our findings 
may improve microarray design for a wide variety of ap- 
plications [18]. 

Results 

Arabidopsis oligonucleotide microarray design 

The Roche NimbleGen custom A. thaliana CGH 3 x 
720 K genome arrays can feature up to 2.16 million 
probes. The custom array that we designed featured the 
2.09 million probes designed to the Arabidopsis TAIR8 
annotated reference genome (www.arabidopsis.org) that 
represents 27,235 protein coding genes, 4,759 pseudogenes 
or transposable elements and 1,288 non-coding RNAs 
(33,282 genes in total and 38,963 gene models) at 49 bp 
staggering (the standard' array). In addition, -15,000 
ultra-high density probes were added to the arrays to rep- 
resent the genes AtGAl, AtHKTl, AtPHYB, AtHYl and 
AtMAX2 and flanking genomic sequences (see Methods, 
Additional file 1: Tables SI and S2). The single ultra-high 



density' probe set representing the larger deletions 
(> 500 bp in the mutant lines gal-S and FN1148) were 
staggered every 6 bp apart (Additional file 1: Figure SI). 
Nineteen ultra-high density' probe sets were designed to 
represent the smaller deletions (< 104 bp) in the mutant 
lines E124, E99, and E207 and were staggered every: 2, 6, 
10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47 
and 49 bp (Additional file 1: Figure SI). These additional 
features allowed us to determine the probe densities re- 
quired for effective detection of the deletions of 4 bp to 
more than 5 kb present in the plant lines gal-S, FN1148, 
E99, El 24 and E207 (see Figure 1). The genomic deletions 
in E99, El 24 and E207 were previously confirmed by dir- 
ect PCR amplification (Additional file 1: Figure S2) and 
Sanger sequencing (Additional file 1: Table S3). 

The arrays used in this study are unlike traditional til- 
ing array formats that feature non- overlapping or par- 
tially overlapping probes with a maximum of one probe 
overlapping another. For example, in this study, the stag- 
gered probes designed every 2 bp can cover a single 
nucleotide position in the Arabidopsis genome up to 37 
times, versus the maximum of twice using conventional 
tiling arrays. Probes containing sequences represented 
more than once in the genome were omitted from the 



ga1-3 



ATG 

b II ■ m il I I ■ 



-I-+ 



TAG 

gal 



5,051 bp 



FN1148 

ATG 





TAA 







E124 



523 bp 



ATG 



104 bp 



E99 



TAG 



TGA 



-- hkt1 



phyB 




28 bp 



E207 



ATG 



TGA 



hyl 



max2 



4 bp 

Figure 1 A schematic showing the positions of five genomic 
deletions in five mutant plant lines. The DNA deletions in 
mutants: gal -3, FN 1 148, El 24, E99 and E207 confer gal hktl, phyB, 
hyl and max2 mutant alleles, respectively. Green boxes represent 
exons, blue lines introns, dashed black lines inter coding sequences, 
and the start (ATG) and stop codons (TAG, TAA or TGA) for each 
gene are shown. The genomic deletion in each gene is shown as a 
red box with the size in bp indicated. The genes are not shown 
to scale. 
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final array design, and all probes in the ultra-high dens- 
ity' probe sets were unique within the TAIR8 annotated 
reference genome. The genomic distribution profiles of 
the ultra-high density' probes, like standard tiling arrays 
including the NimbleGen CGH arrays, are essentially 
linear and unbiased unlike most transcript arrays which 
are biased towards the 3' end of transcripts and some- 
times overlap [19]. This is because sequence data are 
typically derived from EST sequences with a 3 ' bias, and 
the 3' ends of genes are generally more variable and 
provide greater specificity [20]. 

Microarray-based discovery of deletion mutations 

Use of DNA (rather than mRNA) for microarray-based 
deletion cloning strategies has a number of benefits, as 
the mRNA abundance of many genes is too low to be ef- 
ficiently and reliably detected. In addition, the relative 
levels of mRNA could be altered due to the secondary 
effects of the deletion mutation [11]. DNA from mutant 
plant lines or control lines labelled with Cy3 or Cy5 
were co-hybridized to the CGH arrays. Genome-wide 
mutant DNA hybridization intensities were normalized 
with control genomic DNA samples hybridized to the 
same CGH array. Homotypic hybridizations (self-self: the 
same nucleic acid samples labelled with two fluorophores 
and hybridized to a single microarray) should result in a 
slope of Cy3 versus Cy5 intensity equal to one where nor- 
malized (Additional file 1: Figure S3). However, a number 
of putative deletions and/or copy number variants (CNVs) 
are always observed in such experiments as judged by a 
lack of hybridization intensity, or poor hybridization inten- 
sity of either the sample or control. Reduced hybridization 
signals are likely due to experiment-to-experiment variabil- 
ity and variation in the hybridization efficiency between in- 
dividual probes [21,22]. 

To detect the two largest deletions in our graduated 
mutant collection, gal-3 (--5 kb) and FN1148 (523 bp), 
we used a single probe staggering of 6 bp over each gene 
affected (in addition to the NimbleGen standard' probes 
staggered every 49 bp) (Additional file 1: Figure SI). Both 
deletions were observable at a genome-wide level (see 
Figure 2 gal-3 and FN1148 profiles). The larger ~5 kb de- 
letion in gal '3 at the GAl locus was visible as an obvious 
'deletion signal' on chromosome 4 (Figure 2 gal-3 profile). 
This deletion signal consisted of a set of neighbouring 
probes containing the GAl gene sequence that hybridized 
with control Ler DNA but failed to hybridize with gal-3 
DNA, and matched exactly the previously established size 
and location of the gal-3 deletion [23]. The smaller dele- 
tion of 523 bp encompassing part of the HKTl gene [11] 
in the FN1148 mutant was also observable on a genome- 
wide scale on chromosome 4 (Figure 2 FN1148 profile). 

Further analysis indicated the existence of seven other 
putative deletions displaying a probe 'deletion profile' 



threshold of 2 x S.D. in the gal-3 mutant genome 
(shown with a Figure 2; also see Additional file 1: 
Figure S4A). To determine if these were genuine dele- 
tions, we designed oligonucleotides to a gene within each 
deleted region and performed diagnostic PCR analyses. 
PCRs confirmed three of the seven putative deletions were 
false positives (on chromosomes 4 and 5) as the genes 
were amplified from both the Ler control and gal-3 mu- 
tant DNA samples, see Additional file 1: Figure S4B. We 
could not confirm the existence of the four deletions on 
chromosomes 1-3 as PCR products could not be ampli- 
fied from either the control or mutant lines. Interestingly, 
we also observed a potential CNV (duplication) of -4.5 kb 
on chromosome 4 (shown with a +; see Figure 2 gal-3 
profile), although it is alternatively possible that this signal 
represents a deletion event in the Ler control. 

The most striking result of the FN1148 hybridization 
profile (versus control) was that there appeared to be a 
several large (-100 kb to 10 Mb) regions of chromo- 
some 2, 3 and 5 that were deleted but no such deletions 
were observable on chromosomes 1 and 4 (Figure 2 
FN1148 profile and Additional file 1: Figure S4C). Simi- 
lar to the deletions identified in the gal-3 mutant versus 
control above, oligonucleotides were designed to six genes 
within these putative deleted regions and diagnostic PCR 
analyses performed to confirm authenticity. We did iden- 
tify a deletion in the FN1148 mutant on chromosome 2 
of -4.5 kb (a PCR product of -6 kb was amplified from 
the control plant DNA while the FN1148 mutant PCR 
product was -1.5 kb), see Additional file 1: Figure S4D. 
This deletion spanned the entire 4 kb sequence of a trans- 
posable element (A^2G31080). An alternative explanation 
for this deletion profile could be either the natural loss or 
movement of this mobile element in the FN1148 mutant 
genome. PCR analyses showed that the five other genes 
that appeared to be located in large deleted regions on 
chromosomes 2, 3 and 5 were actually present in both the 
control and FN1148 mutant lines, see Additional file 1: 
Figure S4D. These false positives could be associated with 
high signal-to-noise due to non-specific probe hybridiza- 
tions or suboptimal technical aspects of the hybridization 
procedure such as sample labelling or preparation. Also, 
there is a possibility that there were DNA sequence spe- 
cific differences between the A. thaliana FN1148 Col-0 
mutant (obtained from J. Schroeder's lab, UC San Diego, 
USA) and the control A, thaliana Col-0 line that had been 
propagated in our own laboratory since the 1990's. 

At a genome-wide scale, similar deletion profiles to the 
larger deletions (523 bp and -5 kb) were observed for 
other Arabidopsis mutants with smaller deletions (4 bp, 
28 bp and 104 bp). These deletion profiles were composed 
of nineteen staggered probe sets (2, 6, 10, 12, 15, 17, 20, 
22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47 and 49 bp). Al- 
though the numbers of probes required to detect these 



Belfield et al. BMC Genomics 2014, 15:224 
http://www.bionnedcentral.conn/1 471 -21 64/1 5/224 



Page 4 of 1 3 




Chromosome number 

Figure 2 Arabidopsis CGH array data for mutant lines gal-3, FN1148, E124, E99 and E207. The CGH data is displayed as a single panel log2 
signal ratio (mutant/control) rainbow plot for all probes plotted versus genome position, with each chromosome (1-5) differentiated by vertical 
dashed lines. Strongly negative points (e.g. -2 to -6) represent potential deletion mutations, strongly positive points (e.g. 2 to 6) represent 
potential duplication mutations. The known deletions in each mutant line are highlighted by an arrow and the name of the gene affected. Data 
is normalized and segmented. Putative deletions in gal-3 are shown with a and a confirmed deletion of ~7 kb in E99 is shown with a 'A'. The 
staggered 'ultra-high density' probes covering the larger deletions in the FN 1 148 (-500 bp AtHKTl genie mutation) and go! -3 (~5 kb AtGAl genie 
mutation) Arabidopsis mutants were staggered every 6 bp, while the extra probes representing the smaller deletions in the El 24 (104 bp AtPHYB 
genie mutation), E99 (28 bp AtHYl genie mutation) and E207 (4 bp AtMAX2 genie mutation) Arabidopsis mutants were staggered every 2, 6, 10, 
12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47 and 49 bp. 



smaller deletions was higher than those for the larger dele- 
tions, the normalized deletion signals visualized at the 
PHYB, HYl and MAX2 loci (in E124, E99 and E207 mu- 
tant lines, respectively), were similar in magnitude to the 
523 bp hktl deletion (FN1148 mutant line) signal log2 ra- 
tio of — 2 but smaller than that observed for the gal -3 
deletion (signal log2 ratio of — 6) (Figure 2). This was an 



interesting observation since the PHYB, HYl and MAX2 
loci DNA deletions varied in size by one to two orders of 
magnitude (the MAX2 deletion mutation was 4 bp in size 
and the PHYB mutation was 104 bp), and as a conse- 
quence the number of probes representing these deletions 
was different. This suggests that there is not a simple rela- 
tionship between deletion size and the ability to detect it. 
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A total of 77 probes partially covered the PHYB deletion 
while the >25 times smaller deleted region of MAX2 was 
represented 20 times in comparison. Also, the number of 
complete full-length probes covering these deletions dif- 
fered: 23 covered the PHYB deletion but there were none 
representing the MAX2 deletion, as the probe lengths 
were 50-75 mer and the size of the deletion just 4 bp. 

Interestingly, we also detected an additional deletion 
mutation in the E99 mutant (versus control) on chromo- 
some 1 using the standard NimbleGen probe set of 
49 bp staggered probes (shown with a A; Figure 2). This 
deletion had a similar normalized signal log2 ratio of — 6 
to the gal -3 deletion mutation. Oligos were designed 
to the flanking regions of this putative deletion and a 
PGR product was amplified (data not shown) from E99 
genomic DNA confirming the deletion of 7,176 bp that 
encompassed three genes: A^1G18075, A^1G18080, and 
A^IGISIOO, which encoded a microRNA, a DNA repair 
exonuclease and a protein of unknown function, respect- 
ively. In addition, the presence of this ~7 kb genomic 
deletion and a number of other mutations including the 



28 bp deletion in the HYl gene, were confirmed by 
whole genome sequencing [13]. 

In-depth analysis of probe densities required to detect 
relatively small genomic deletion mutations 

To investigate the oligonucleotide probe resolutions re- 
quired to detect genomic deletions smaller than those in 
gal-3 (--5 kb) and FN1148 (-0.5 kb) mutant lines, we 
compared the normalized hybridization profiles of indi- 
vidual probe datasets obtained for the El 24 (104 bp 
phyB deletion), E99 (28 bp hyl deletion) and E207 (4 bp 
max2 deletion) mutants. To do this we used nineteen 
probe sets staggered every 2, 6, 10, 12, 15, 17, 20, 22, 25, 
27, 30, 32, 35, 37, 40, 42, 45, 47 and 49 bp (Additional 
file 1: Figure SI) over the affected genes, and determined 
the probe resolutions required to efficiently detect the 
different sizes of deletion. 

Scatter plots of the relative DNA hybridization intensity 
(mutant versus control), showed that the deletion muta- 
tions of 4 bp to 104 bp were observable as obvious dele- 
tion profiles using the lowest staggered probe staggering 
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Figure 3 Visualization of five genomic deletions of varying size in Arabidopsis mutant lines gal-3, FN1 148, El 24, E99 and E207 (gene 
affected in each mutant and deletion sizes are shown). Scatter plots show the relative log2 DNA hybridization intensities (mutant minus 
control) representing genes at 6 bp {GAl and HKTl) or 2 bp {PHYB, HYl and MAX2, see Additional file 1: Figure S5-8) probe spacing. Black dots 
represent probes and the black triangle indicates the location of the deletion within the region that the custom array probes represent. An 
obvious 'deletion signal' (contiguous points with relatively extreme negative log2 ratio values) was detected in all lines. 
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of 2 bp (Figure 3). Likewise, the single staggered probe set 
of 6 bp used to identify the deletions present in the gal -3 
and FN1148 mutant lines were also observable as obvious 
deletion profiles (Figure 3). However, the cost of produ- 
cing microarrays with oligonucleotides staggered every 
2 bp (25-fold higher than that of the standard 49 bp 
NimbleGen CGH array) or 6 bp would be prohibitive, es- 
pecially for species with large genomes. 

In order to determine the density of probes required 
to detect smaller DNA deletions, we used a sliding win- 
dow approach of two or three consecutive probes that 
had deletion profiles. To do this, the number of probes 
that either partially or fully covered the 4 bp to 104 bp de- 
letions (termed 'Designed' probes), were compared to the 
number of probes following hybridization that had a dele- 
tion profile (termed 'Detected' probes). For example, of 



the 294 probes that were designed to span the whole 
PHYB gene with 17 bp staggered probes (Additional file 1: 
Table S2), only 9 either partially or completely covered the 
104 bp deletion (in the E124 mutant line) (Figure 4A). 
Eight of these probes (88.9%), gave a detectable normal- 
ized deletion signal below the two S.D. threshold of < -0.4, 
(see Methods, Figure 4B, Table 1 and Figure 5A). 

Further analysis was performed for all nineteen of the 
'Designed' and 'Detected' 'ultra-high density' probe sets 
(2 bp to 49 bp) representing the phyB 104 bp deletion (in 
the El 24 mutant line) and the results are listed in Table 1 
and graphically shown in Figure 5A. Using a threshold of 
3 consecutive probes that also had normalized log2 ratio 
intensities of < -0.4 (i.e. a 'deletion profile'), we found that 
probes staggered between 45 bp and the NimbleGen 
'standard' probe density of 49 bp were sufficient to 
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Figure 4 The design and evaluation of probes used to detect a 104 bp genome deletion. (A) A schematic showing the location of the 
104 bp phyB deletion in the El 24 A tholiono mutant line. Also shown, an enlarged view of the 17 bp staggered probes designed to cover the 
deletion region. Nine probes cover the 104 bp deletion either fully (red filled boxes) or partially (black and red filled boxes). (B) A total of eight of 
the nine probes (89%; labelled 2 to 9) that covered the deletion region showed a typical deletion profile (normalized hybridization signal log2 
ratio of less than -0.4 (dashed line)). One probe (number 1) had a log2 ratio value higher (-0.10506) than the -0.4 threshold. 
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Table 1 The design and experimental performance of 
nineteen staggered probe sets used to detect a 104 bp 
genomic DNA deletion in an A. thaliana mutant 



Probe set 
staggering 
(bp) 



Number of probes 

Detected with a 
Designed log2 normalized 
ratio of < -0.4 



%of 
probes with a 
log2 normalized 
ratio of < -0.4 



2 


77 


66 


86 
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25 


22 


88 


10 


15 


14 


93 


12 


12 


12 


100 


15 


10 
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90 


17 
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89 
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7 


88 


22 
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7 


100 


25 
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6 


100 


27 


5 


5 


100 


30 


5 


4 


80 


32 


5 


5 


100 


35 


5 


5 


100 


37 


4 


4 


100 


40 


4 


4 


100 


42 


4 


3 


75 


45 


3 


3 


100 


47 


3 


3 


100 


49 


3 


3 


100 



Table 1 The table shows the staggered design of probes that cover a 104 bp 
phyB deletion In the El 24 A thaliana mutant line. The number of probes In 
each staggered probe set that covered the deletion are shown (column titled 
'Designed'). The number and percent of 'Designed' probes that were 
experimentally detected with normalized log2 Intensities of < -0.4 are 
also shown. 

identify this size of deletion mutation (Figure 5A). We 
conclude that deletions of -100 bp and larger can be 
confidently detected using tiling microarrays designed 
with isothermal probes staggered every 49 bp using a 
search criteria of > 3 consecutive probes that have a de- 
letion profile. Recently, a similar threshold of at least 
three consecutive probes with a normalized log2 ratio 
threshold of > 0.4, has also been successful in identify^ing 
mutations (CNVs) in human patients with drug resistant 
epilepsy [24]. 

To detect smaller deletions of 28 bp (present in the 
E99 mutant line) and 4 bp (present in the E207 mutant 
line), higher probe densities than those used to detect 
the 104 bp above were needed. Figure 5B shows that 
probes would need to be staggered between 22 and 
32 bp to detect the hyl 28 bp deletion, and between 10 
and 12 bp to detect the 4 bp max2 deletion (Figure 5C), 
using the same threshold of 3 consecutive probes with a 
deletion profile. However, if a less stringent threshold of 
just 2 consecutive probes was used, the 28 bp deletion 



was detectable using staggered probes of 35 to 47 bp, 
and likewise the 4 bp would be detectable using stag- 
gered probe resolutions of 15 to 25 bp. 

Previous studies have shown that one or two probes 
are sufficient to detect genomic deletions of -100 bp [4] 
and -500 bp [11], deletions that are considerably larger 
than the smallest deletion (4 bp) analyzed in our study. 
Our analyses of the standard' probes on the NimbleGen 
arrays showed they were highly sensitive and highly reli- 
able in detecting deletions of various sizes. For example, 
the 104 bp phyB deletion (El 24 mutant) was repre- 
sented on our custom microarrays by three probes at 
the standard' staggering of 49 bp (Table 1, Figure 5 A 
and Additional file 1: Figure S5), while the smaller 28 bp 
and 4 bp deletions in the hyl (E99) and max2 (E207) 
mutants were each represented just once at the same 
standard' staggering, respectively. However, all five of 
these standard' probes that represented these three differ- 
ent sized deletions gave normalized DNA hybridization 
log2 ratio values of less than -0.4 (Figures 5B and C, and 
Additional file 1: Figures S6 and S7). This shows the excel- 
lent sensitivity of this type of microarray, but the detection 
of a genomic deletion with just a single probe risks back- 
ground signal-to-noise interference, with the persistent 
possibility of false positives being called. 

Overall, the ultra-high density' probe sets used in our 
study enabled us to determine the number of probes re- 
quired and at the precise staggering (between 2 and 
49 bp) that allow reliable detection of deletions of just a 
few bp in size. These results will aid researchers design- 
ing microarrays to detect deletions of different sizes 
without any a priori knowledge of the deletion. For 
example, using a stringent 3 consecutive probe criteria, 
deletions of -100 bp require microarrays with probes 
staggered < 45 bp while smaller deletions of -30 bp need 
probes staggered < 32 bp and deletions of a just a few bp 
require probes staggered < 12 bp (Figures 5A-C). 

NimbleGen microarray performance analysis 

A previous study used an additional criterion to predict 
deletions in rice plants using microarrays based on a 
combination of probe threshold and probe proportions 
[25]. Bruce et al, (2009) called putative deletions when > 
50% of the probes representing a gene model had probe 
log2 ratio intensities (mutant probe intensity/wild type 
probe intensity) of < -0.8. In our study, all nineteen 
ultra-high density' probe sets (2, 6, 10, 12, 15, 17, 20, 22, 
25, 27, 30, 32, 35, 37, 40, 42, 45, 47 and 49 bp) used to 
detect the 4 bp, 28 bp and 104 bp deletions had > 50% of 
probes with log2 ratio intensity ratios below the two S.D. 
threshold of < -0.4, indicative of genomic deletions 
(Figure 5A-C, and Figure 6). We found the average 
number of 'detected' probes with a 'deletion profile' out 
of all nineteen 'designed' 'ultra-high density' probe sets 
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Figure 5 The design and experimental performance of probe 
sets used to detect genome deletions. (A) A graphical 
representation of tine data sliown in Table 1. The number of probes 
in each of the nineteen staggered probe sets 'designed' to cover 
the phyB deletion of 104 bp in El 24 line are shown with a sold 
black line. The numbers of 'designed' probes, that were 
experimentally 'detected' with normalized log2 intensities of less 
than -0.4 are also shown with a dashed black line. For example, in 
Figure 4B 8 of the 9 (88.9%) designed probes (staggered every 
17 bp) covering the phyB deletion had a normalized signal log2 
ratio of less than -0.4 in Figure 4B. These 17 bp staggered probe 
numbers designed and detected are shown in (A) by a black arrow. 
(B) and (C), are similar to (A), the number of probes in each of the 
nineteen staggered probe sets designed to cover the hyl deletion 
of 28 bp in line E99 and the mox2 deletion of 4 bp in line E99 are 
shown with a sold black lines and those experimentally detected 
with normalized log2 intensities of less than -0.4 are also shown 
with a dashed black lines, respectively. The staggered probe data for 
the mox2 deletion (in line E207) >30 bp are not shown but the data 
is shown in Additional file 1: Figure S8. 



were: 94%, E124; 86%, E99; and 99%, E207 (Figure 6). 
For example, 95% (19 of 20 probes) of the 2 bp stag- 
gered probes representing the smallest deletion (4 bp 
present in the E207 mutant line) had a log2 ratio nor- 
malized intensity of less than -0.4 (Figure 6). Likewise, 
84% (26 of 31) and 86% (66 of 77) of the 2 bp staggered 
probes covering the two larger deletions (28 bp and 
104 bp present in the E99 and E207 mutant lines, re- 
spectively) had a log2 ratio normalized intensity of less 
than -0.4 (Figure 6). Although, the lowest values we ob- 
served were 50% for two out of the fifty-seven ultra- 
high density probe sets (the 37 bp and 45 bp probe sets 
representing the hyl mutation in E99; Figure 6), the 
overall results suggest that most of the 'ultra-high dens- 
ity NimbleGen CGH probe sets used in our study are 
highly sensitive, and are efficient in detecting deletion of 
various sizes. 

Discussion 

Microarrays are a powerful tool in the genetic analyses 
of both prokaryotes and eukaryotes. In this study we de- 
termined the probe resolution required on microarrays 
to reliably detect genome deletions. Using irradiated A. 
thaliana plant lines that were isolated from forward gen- 
etic screens, we were able to identify mutant phenotype- 
causal deletions that ranged in size from just a few bp to 
larger kb variants. Although forward genetics has been 
used for almost 100 years, a major difficulty of this tech- 
nique is the molecular identification of the causal muta- 
tions responsible for an observed phenotype. This is 
especially true for smaller deletions of just a few nucleo- 
tides that can remain undetected. 

In order to study the appropriate density of probes on 
a microarray to detect small genomic deletions, we uti- 
lized a set of A. thaliana mutant plant lines that we had 
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Figure 6 A histogram showing the percentage of 'designed' probes that indicated a genomic deletion. The columns represent the 
percent of total probes that covered each of the deleted regions in lines El 24 (104 bp AtPHYB genie mutation), E99 (28 bp AtHYl genie mutation) 
and E207 (4 bp AtMAX2 genie mutation) that were detected as deletions. The probes that were categorized as deletion probes had normalized 
hybridization signal log2 ratios (mutant versus control) of < -0.4. 



previously characterized. These mutants, obtained from 
an irradiation mutagenesis screen, had genomic deletions 
in three photomorphogenesis genes MAX2, HYl, and 
PHYB, The deletions were 4 bp to 104 bp in size and 
made excellent test cases to optimize CGH-microarray 
based genomic deletion detection. We designed a Nimble- 
Gen deletion profiling microarray to detect these A. thali- 
ana mutations. This targeted array contained ultra-high 
density probes covering the entire genomic regions of the 
three genes affected and included exon, intron, promoter 
and UTR sequences. The probes were added to the micro- 
array in nineteen physically staggered designs to cover 
each region from 2 bp to 49 bp. Following normalization 
of the hybridization signals of the mutants versus control 
samples, we could perform in-depth analysis of the probe 
densities required to confidently detect relatively small 
genomic deletions. 

We found that to detect genomic deletions of -100 bp, 
a microarray with probes staggered every 45 bp or less 
was required (Figure 5A). We also found that smaller de- 
letions of 28 bp and 4 bp required higher densities of 
probes of at least 22 bp and 10 bp, respectively (Figure 5B 
and C). It is important to note that the long oligonucleo- 
tide probes (60-80 mer) on the NimbleGen microarrays 
we used in this study provided excellent detection sensi- 
tivity and reliability (see Table 1, Figures 5A-C and 
Figure 6). Microarrays from alternative manufacturers 
with shorter probes (e.g. 25 or 30 mer) might not have de- 
tected similar deletions because they have been shown to 
be less sensitive than longer probes [26,27]. 

Although most microarray applications are research- 
use-only, this technology is increasingly being used in 
clinical based genomic applications [28]. For example, 
microarrays have been used to identify DNA deletions 



associated with serious human conditions such as cancer 
[29], muscular dystrophy [30], and osteoporosis [31]. In 
addition, personalized healthcare is a rapidly developing 
scientific area and microarrays may be utilised in the 
identification and early detection of treatable diseases 
[28]. The deletions reported in these previous clinical 
studies vary in size from lOO's of bases to lOO's of kilo- 
bases. While these studies demonstrate that arrays are 
an excellent tool for identifying DNA deletions, the dens- 
ity of oligonucleotides required to detect smaller disease- 
causing deletions remains unclear. 

With the advent of NGS it is now possible for ge- 
nomes to be sequenced at high depth. However, at the 
present time no single platform, neither NGS nor micro- 
arrays, can identify all genomic sequence variants [32]. 
NGS can be very expensive, computational costs can be 
high and it can be time prohibitive for large numbers of 
samples [33]. Indeed, whole-genome sequencing is not 
necessary for many research studies that focus on specific 
target regions, such as promoters, exons and regulatory el- 
ements. This is also true for genomes with highly repeti- 
tive DNA sequences, such as the human and wheat 
genomes that are composed of 50% and 80% repeats, 
respectively [34,35]. For example, the 50 Mbs of the ex- 
tended human exome could easily be represented on a 
single array [36]. Microarrays are a less labour intensive 
alternative to NGS and are likely to remain popular in re- 
search laboratories especially for those without extensive 
bioinformatic support. Also, microarrays can provide 
rapid and parallel analysis of large numbers of samples. 
Oxford Gene Technology, for example, processed 20,000 
samples in 20 weeks on behalf of The Wellcome Trust 
Case Control Consortium researching CNVs in common 
human diseases [32]. 
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Conclusion 

The small deletions analysed in this report are typical of 
those found in plants following irradiation mutagenesis 
[13] and in certain human disorders: frameshift causing 
deletions of 4 bp have been reported to be responsible 
for severe brain malformations [37]; a condition in fam- 
ilies that can cause the lung to collapse [38]; and in a 
syndrome that can lead to increased risk of cancer of the 
colon, stomach, and pancreas [39]. Our findings show 
that the probe density on a microarray is critical in iden- 
tif)^ing genomic deletions and is fundamental to the 
success of experiments. Our results will help researchers 
working on both prokaryotes and eukaryotes to design 
microarrays with the optimal probe densities to detect 
both large and small deletions. These findings are applic- 
able to any organism with a well-annotated sequenced 
reference genome. 

Methods 

Plant material and growth conditions 

All experiments used either the Landsberg erecta (Ler) 
or the Columbia (Col-0) laboratory strains of A. thaliana 
as genetic background, gal-3 seeds were originally iso- 
lated from a FN-mutagenized Ler plant [40,41] that had 
a genomic deletion encompassing part of the GAl 
gibberellin biosynthesis gene (A^4G02780). Seeds of the 
Col-0 AtHKTl mutant (FN1148) were kindly donated by 
Julian Schroeder, UC San Diego, USA [11]. Plants were 
grown on soil with a 16 h light/8 h dark photoperiod at 
22-24°C (irradiance 120 (imol photons m"^ s"^). 

Three Arabidopsis elongated hypocotyl mutants E99, 
E124 and E207 were obtained from visual screens of seed- 
lings grown from a fast neutron irradiated DELLA defi- 
cient (gai-t6, rga-t2, rgll-1, rgl2-l and rgl3-4) [42] seed 
collection [13]. The plants harbour genomic deletions 
within the HYl {At2G26670) [40], PHYB (A^2G18790) 
[43] and MAX2 (A^2G42620) [44] genes, respectively. The 
genomic deletion in each FN elongated hypocotyl mutant 
was confirmed by direct PCR amplification and Sanger se- 
quencing using the oligonucleotides listed in Additional 
file 1: Table S3. Table SI (see Additional file 1) gives an 
overview of the five Arabidopsis mutants used in this 
study, the sizes of the fast neutron-induced deletions, the 
coordinates of the deletions, and the gene affected. 

DNA extraction and microarray experiment 

Genomic DNA was extracted from plant leaf material 
using a DNeasy Mini Kit (Qiagen). The yield and quality 
of the samples was checked by running the samples on a 
1% agarose gel (data not shown), and 6 [ig was sent to 
Roche NimbleGens custom microarray services facility. 
Microarray hybridizations and washings were performed 
using the standard NimbleGen protocol for CGH analyses 
(http://www.nimblegen.com/products/cgh/index.html). 



Control DNA samples were extracted from A. thali- 
ana Col-0 and used to normalize the FN1148 mutant 
hybridization signal; wild type A, thaliana Ler DNA was 
used to normalize the gal-3 mutant hybridization sig- 
nal; and the DNA of the DELLA-deficient progenitor 
line (mostly A, thaliana Ler background with a ~3 Mb 
chromosome 5 Col-0 segment [13] was used as the 
control to normalize the E99, E124 and E207 mutant 
hybridization signals. 

Microarrays 

We used Roche NimbleGens two-colour Arabidopsis 
CGH 3 X 720 K whole genome custom tiling arrays that 
feature empirically tested probes of 50-75 mer that 
provide improved data quality (i.e. signal-to-noise) com- 
pared with computationally selected probes (http://www. 
nimblegen.com/products/cgh). Previous studies have 
shown that microarrays with longer oligonucleotides 
(60-80 mer) provide significantly better detection sensi- 
tivity than those with shorter oligonucleotides (e.g. 25 or 
30 mer) [26,27,45]. 

Using the A, thaliana TAIR8 annotated reference gen- 
ome sequence (www.arabidopsis.org), probes were de- 
signed to represent the complete model plant genome 
every 49 bp with 50-75 bp partially overlapping probes, 
i.e. the distance between the 5' ends of consecutive 
oligonucleotides. In addition to these standard' array 
probes staggered every 49 bp, extra probes were added 
with the purpose of representing the five genes AtGAl, 
AtHKTl, AtPHYB, AtHYl and AtMAX2 at ultra-high 
density'. To achieve this purpose, around 15,000 isother- 
mal 'ultra-high density' probes (50-75 bp; see Additional 
file 1: Table S2) were designed to cover the genomic re- 
gions of each of the five genes from 500 bp 5' to the 
gene's start codon, continuing through the genie region 
(including both exonic and intronic sequences) and end- 
ing 500 bp 3' of the stop codon. 

Probes were designed to represent the larger deletions 
in the FN1148 (--500 bp AtHKTl genie mutation) and 
gal-3 (~5 kb AtGAl genie mutation) Arabidopsis mu- 
tants with oligos staggered every 6 bp, while the extra 
probes representing the smaller deletions in the El 24 
(104 bp AtPHYB genie mutation), E99 (28 bp AtHYl 
genie mutation) and E207 (4 bp AtMAX2 genie muta- 
tion) Arabidopsis mutants were staggered every: 2, 6, 10, 
12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47 
and 49 bp (Additional file 1: Figure SI; Additional file 2: 
Tables S4-S22). To reduce the background noise of the 
'ultra-high density' probes, probes that mapped perfectly 
to more than one genomic location were excluded. Also, 
sequences that were unique to the Arabidopsis TAIR8 
annotated reference genome were assigned randomized 
locations across the NimbleGen array. 
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Microarray hybridizations and data analysis 

Test and reference control genomic DNAs were inde- 
pendently labelled with fluorescent dyes (mutant DNA 
was labelled with cyanine (Cy) 3 and control DNA labelled 
with Cy5), co-hybridized to the NimbleGen A. thaliana 
2.1 M Whole-Genome CGH arrays, and scanned using a 
5 (im scanner. Log2-ratios of the probe signal intensities 
(Cy3/Cy5) were calculated and plotted versus genomic 
position using Roche NimbleGen NimbleScan software. 
To determine the efficiency and to reduce costs associated 
with CGH-based deletion discovery, we used a single array 
per deletion experiment without replicate hybridizations. 

The criteria used to identify deletions and CNVs by 
aCGH vary considerably between studies. Such muta- 
tions are commonly distinguished from low-level gains/ 
losses using a direct threshold of array data. However, 
the threshold value often differs greatly, ranging from a 
log2 ratio of +/-0.4 for some studies [46,47] to as high 
as +/-1.0 for others [48,49]. The criteria we used to 
detect deletions were based on the aCGH patterns ob- 
tained from our mutant versus wild type hybridizations. 
From our El 24, E99 and E207 versus control hybridiza- 
tions, probes that represented loci with an equal copy 
number, had a mean log2 normalized intensity ratios of 
0.0 +/-0.2 S.D. Based on this variation, empirical ana- 
lyses (based on looking at the frequency distribution of 
log ratios for probes along known deleted and duplicated 
regions), and previous studies [50,51], we chose a probe 
'deletion profile' threshold of 2 x S.D., i.e. +/-0.4. In- 
deed, of the total number probes on the microarrays 
only 2.73 - 4.97% from each mutant versus control 
hybridization gave log2 ratios above or below +/-0.4, 
suggesting the probes on the microarray had a low level 
of signal-to-noise. In addition, we would expect about 
5% of probes to exceed this threshold by chance if the 
log2 ratios are normally distributed. 

Availability of supporting data 

The data sets supporting the results of this article are 
available in the NCBI GEO repository (accession num- 
ber GSE55327 study at: http://www.ncbi.nlm.nih.gov/ 
geo/query/acc.cgi?acc=GSE55327). 

Additional files 



E207 mutant detected with NimbleGen CGH arrays. Figure S8. Design 
and experimental performance of the NimbleGen CGH array probes. 
Table SI. A table showing the details of known deletions in five 
Arabidopsis mutants used in this study. Table S2. The table lists the 
numbers of custom array probes staggered from 2 bp to 49 bp 
representing the genes in five Arabidopsis deletion mutants used in this 
study. Table S3. List of genomic DNA deletion mutations identified and 
verified in three fast-neutron irradiated mutants. 

Additional file 2: Tables S4-S22. List and design information for 
custom probes with 2 to 49 bp staggerings. 
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